Lesson: Using Data Cleanse Transforms 


Recognize specific phrases 


Some words can be used in both firm names and job titles. As a result, Data Cleanse may 
incorrectly recognize some job titles as firm names. To improve parsing, you can add these 
job title phrases to the dictionary. 


Identify firm names containing personal names 


Firm names are often made up of personal names. As a result, Data Cleanse may incorrectly 
parse the firm as a personal name. For example, the catalog retailer J. Crew may be parsed as 
a personal name rather than as a firm. To improve parsing, you can add multiple-word firm 
names to the dictionary. For example, to parse J. Crew as a firm rather than as a personal 
name, add J. and Crew to the dictionary with the Firm_Name classification, and J. Crew with 
the Firm_Name_Alone classification. 
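
The J. Crew entries above can be pictured with a small sketch. It is not the Data Cleanse parser or the cleansing package format. The in-memory DICTIONARY table and the classify function are hypothetical names used only to show how word-level Firm_Name entries and a phrase-level Firm_Name_Alone entry could work together.

```python
# Hypothetical stand-in for dictionary entries; the real cleansing package
# format and the Data Cleanse parsing rules are not shown in this lesson.
DICTIONARY = {
    "j.": {"Firm_Name"},
    "crew": {"Firm_Name"},
    "j. crew": {"Firm_Name_Alone"},
}


def classify(text):
    """Return 'firm' when the dictionary marks the input as a firm, else 'unknown'."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(text.lower().split())

    # The whole phrase is classified as a firm name standing alone.
    if "Firm_Name_Alone" in DICTIONARY.get(key, set()):
        return "firm"

    # Otherwise, every individual word must carry the Firm_Name classification.
    words = key.split()
    if words and all("Firm_Name" in DICTIONARY.get(w, set()) for w in words):
        return "firm"

    return "unknown"
```

In this sketch, classify("J. Crew") is treated as a firm because of the added entries, while a name with no entries, such as "John Smith", is left for other rules to handle.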


Changes to structure require that you upgrade all Data Services 3.2 Data Cleanse dictionaries 
to cleansing packages of the current version. To execute successfully, a Data Cleanse job 
must reference a cleansing package. The cleansing package may be either an SAP-supplied 
cleansing package or a cleansing package that you have modified and published in the 
Cleansing Package Builder module of Information Steward. 


Restriction: If you have modified a person and firm dictionary or created a custom
dictionary using Universal Data Cleanse, ensure that Cleansing Package Builder in
Information Steward is available before you upgrade. You need Cleansing Package Builder to
migrate your dictionary, rules, and reference files to the new cleansing package format.


International Phone Parsing Enhancements 


There is a new Phone Options group in Data Cleanse Options that consists of the following
parameters:


• ISO2 Country Code Sequence
• North American Phone Parens Area
• North American Phone Delimiter After Area
• North American Phone Delimiter
• Phone Extension Text


The ISO2 Country Code Sequence parameter specifies a series of two-letter ISO country codes,
separated by pipe characters ("|"), in the order you want Data Cleanse to parse phone information.
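
To illustrate why the order of the sequence matters, here is a minimal sketch. It assumes a value such as "US|DE" and does not reproduce how Data Cleanse parses phone numbers. The function names and the two toy patterns in PATTERNS are assumptions made for the example.

```python
import re

# Toy phone patterns, for illustration only; these are assumptions and
# not the rules Data Cleanse applies for each country.
PATTERNS = {
    "US": re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$"),
    "DE": re.compile(r"^0\d{2,4}[ /-]?\d{4,8}$"),
}


def parse_country_sequence(value):
    """Split a pipe-separated ISO2 sequence into an ordered list without duplicates."""
    codes = []
    for part in value.split("|"):
        code = part.strip().upper()
        if not code:
            continue  # ignore empty segments such as a trailing pipe
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"not an ISO2 country code: {part!r}")
        if code not in codes:
            codes.append(code)
    return codes


def first_matching_country(phone, sequence):
    """Try each country in sequence order and return the first whose pattern matches."""
    for code in parse_country_sequence(sequence):
        pattern = PATTERNS.get(code)
        if pattern and pattern.match(phone.strip()):
            return code
    return None
```

With these toy patterns, the string "030 1234567" fits both formats, so the sequence "DE|US" yields DE while "US|DE" yields US. Listing the countries you expect most often first is therefore the design choice the parameter exposes.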


Casing Results Improvement 


Cleansing package dictionaries can be used to adjust casing. If you use mixed case, the 
general rule is to capitalize the first letter of the word and put the rest of the word in 
lowercase. However, there are exceptions to that rule, such as McDonald, Ph.D., IBM, NY, and 
so on. 


To handle mixed-case exceptions, Data Cleanse consults secondary information standards in 
the dictionary. The dictionary contains the correct casing of a word and also indicates when 
that casing should be used. For example, the entry MS is cased differently depending on how 
it is used. M.S. is an abbreviation for the honorary postname Master of Science and Ms. is a 
prename. The dictionary indicates which formatting to use based on the content type. 
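
The casing behavior described above can be sketched as a lookup with a fallback to the general rule. This is an illustration under stated assumptions, not the dictionary's actual structure. The CASING_STANDARDS table, the content type labels PRENAME and HONORARY_POSTNAME, and the mixed_case function are hypothetical.

```python
# Hypothetical casing standards keyed by (word without periods, content type).
# A content type of None means the casing applies regardless of type.
CASING_STANDARDS = {
    ("MS", "PRENAME"): "Ms.",
    ("MS", "HONORARY_POSTNAME"): "M.S.",
    ("MCDONALD", None): "McDonald",
    ("PHD", None): "Ph.D.",
    ("IBM", None): "IBM",
    ("NY", None): "NY",
}


def mixed_case(word, content_type=None):
    """Return the mixed-case form of word, honoring exceptions in the table."""
    key = word.replace(".", "").upper()

    # Prefer an exception tied to the content type, then a type-independent one.
    for lookup in ((key, content_type), (key, None)):
        if lookup in CASING_STANDARDS:
            return CASING_STANDARDS[lookup]

    # General rule: capitalize the first letter, lowercase the rest.
    return word[:1].upper() + word[1:].lower()
```

In this sketch, the input MS becomes Ms. when the content type is a prename and M.S. when it is an honorary postname. A word with no entry, such as SMITH, falls back to the general rule and becomes Smith. Adding a custom standard corresponds to adding one more entry to the table.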


Most Data Cleanse users find that the default capitalization of words in the dictionary is 
sufficient for producing good mixed-case results. However, it is impossible for the default 
dictionary to contain every mixed-case exception. If Data Cleanse does not case a word as 
you want, you can create a custom standard in the dictionary. 